Abstract

Haploinsufficiency or mutation of TBX1 is largely responsible for the etiology of physical malformations in individuals with
velo-cardio-facial/DiGeorge syndrome (VCFS/DGS/22q11.2 deletion syndrome). TBX1 encodes a transcription factor protein
that contains an evolutionarily conserved DNA binding domain termed the T-box that is shared with other family members.
All T-box proteins examined so far bind to similar but not identical consensus DNA sequences, indicating that they have
specific binding preferences. To identify the TBX1-specific consensus sequence, Systematic Evolution of Ligands by
Exponential Enrichment (SELEX) was performed. In contrast to other TBX family members, which recognize palindromic
sequences, we found that TBX1 preferentially binds to a tandem repeat of 5'-AGGTGTGAAGGTGTGA-3'. We also identified a
second consensus sequence comprising a tandem repeat with a degenerate downstream site. We show that three
known human disease-causing TBX1 missense mutations (F148Y, H194Q and G310S) do not alter nuclear localization or
disrupt binding to the tandem repeat consensus sequences, but they reduce transcriptional activity in cell culture reporter
assays. To identify TBX1-downstream genes, we performed an in silico genome wide analysis of potential cis-acting elements
in DNA and found strong enrichment of genes required for developmental processes and transcriptional regulation. We
found that TBX1 binds to 19 different loci in vitro, which may correspond to putative cis-acting binding sites. In situ
hybridization coupled with luciferase gene reporter assays on three gene loci, Fgf8, Bmper and Otog-MyoD, shows that these
motifs are directly regulated by TBX1 in vitro. Collectively, the present studies establish new insights into molecular aspects
of TBX1 binding to DNA. This work lays the groundwork for future in vivo studies, including chromatin immunoprecipitation
followed by next generation sequencing (ChIP-Seq), to further elucidate the molecular pathogenesis of VCFS/DGS.
Introduction

T-box genes encode a large family of transcription factors that
are required during embryonic development. Brachyury, the 
founding member of this family was first identified due to the 
presence of a short tail phenotype found in heterozygous mice and 
lack of axial development with early lethality in null mutant 
embryos [1-3]. Brachyury has an evolutionarily conserved DNA 
binding domain, termed the T-box, and can regulate transcription 
of a reporter gene in cell culture [4,5]. Since the original discovery 
of Brachyury, nineteen different T-box genes have been identified 
and are evolutionarily conserved from flies to humans [6-8]. Most
T-box genes are dispersed on different chromosomes. They are
classified based upon sequence homology to each other and are 
members of five different subfamilies [6-8]. As for Brachyury, 
most T-box transcription factors are required for embryonic 
development and many are sensitive to altered gene dosage for 
biological function. The T-box family has attracted attention because
mutations in its members underlie several congenital
malformation disorders. For example, mutations in TBX3 lead to
Ulnar Mammary Syndrome, and mutations in TBX5 cause Holt-Oram
Syndrome, both of which present with disease-specific limb
and heart defects [9,10], among others [11].



All T-box family members share an evolutionarily conserved
DNA binding domain comprising approximately 180 amino acids.
The Brachyury protein binds as a homodimer to a palindrome of
two AGGTGTGA "half-sites" [4]. Brachyury can also bind as a
monomer to a single half-site, but with 20-fold lower binding
affinity [12]. Molecular biological methods have been used to
identify the consensus sequence for other T-box proteins and most
can bind to the Brachyury palindrome or half-site [4,12,13], but
they have their own preferential binding site, as in the case of
TBX5, TBX6, TBX15 and TBX18 [14-16]. Among other T-box
proteins tested, Brachyury, TBX15 and Eomes can bind to a
direct repeat [16-18].

The TBX1 gene encodes a T-box transcription factor that maps
to the 22q11.2 region, which is hemizygously deleted in individuals
with velo-cardio-facial syndrome and DiGeorge syndrome
(VCFS/DGS; MIM #: 192430/188400). Since most affected individuals have a
typical 3 million base pair deletion, the condition is also referred to as 22q11.2
deletion syndrome (22q11DS). Historically, TBX1 was found to
bind to the palindromic T-site, but unlike other transcription
factors, it did not significantly activate or repress transcription of
a reporter construct [12]. Heterozygous mutations in TBX1 have
been reported in rare non-deleted patients with physical
defects related to those of VCFS/DGS. It is believed that these are loss of
function mutations resulting in haploinsufficiency [19-21].

As for other T-box genes, Tbx1 is required in a dose dependent
manner for normal mouse embryogenesis [22-24]. Inactivation of
Tbx1 results in neonatal lethality and embryos have a cleft palate,
abnormal inner ears, absent thymus and parathyroid glands, and
persistent truncus arteriosus [22-25]. Unbiased gene profiling on
RNA from microdissected tissues [25-28] and candidate gene
approaches, based upon similar knockout phenotypes, have been
undertaken to identify downstream genes of Tbx1 [25,26,28].
Among hundreds of genes identified, only eight direct downstream
transcriptional target genes were found, including Fgf8, Fgf10,
Pitx2, Chd7, Vegfr3, Eya1 and Wnt5a [29-36].

To expand the repertoire of direct transcriptional downstream
target genes, we performed Systematic Evolution of Ligands by
Exponential Enrichment (SELEX) to identify the mouse TBX1
consensus site [4,37-40]. We found that TBX1 binds to two
different consensus sequences, one that is a perfect tandem repeat
of the Brachyury half-site and the other that is an imperfect
tandem repeat. TBX1 can activate transcription from these novel
sites in luciferase reporter assays in cell culture. Using these new
consensus TBX1 sites, we found that the TBX1 mutations
previously reported alter transcriptional activity. Next, we used
the new consensus sequences to identify potential
downstream transcriptional target genes. After performing an in
silico genome wide search for these motifs, we tested 30 and
validated 11 putative direct binding sites, including sites in the
Fgf8, Bmper and Otog-MyoD genomic loci. These and others are
strong candidates to be pursued as direct downstream targets in
future in vivo functional experiments.

Materials and Methods 

Ethics Statement 

Animal studies were carried out in strict accordance with the 
recommendations in the Guide for the Care and Use of 
Laboratory Animals of the National Institutes of Health. The 
protocol was approved by the Albert Einstein College of Medicine 
Animal Institute Committee (Protocol Number: 2013-0405; 
Protocol Name: Mouse Models of 22q11 Rearrangement Disorders).
All embryo dissections were conducted after euthanizing
mice by direct inhalation of CO2.

Recombinant GST-TBX1 Fusion Protein 

The T-box region (amino acids 90-303) of mouse Tbx1 was
PCR amplified from cDNA with the flanking restriction enzyme
sites of EcoRI and XhoI. These sites were used to subclone the
DNA fragment into the bacterial expression vector, pGEX4t3 (GE
Healthcare), to generate a GST-TBX1 fusion protein. The vector
was transformed into BL21 (DE3)LysS competent cells (Stratagene)
and grown on LB ampicillin agar plates. Colonies were picked and
grown in 2XYT media, 10 mg/ml Amp, 1 M MgCl2 and 20%
glucose. Cultures were grown at 29°C and protein expression was
induced by the addition of 100 mM IPTG (isopropyl-beta-D-thiogalactopyranoside).
After induction with IPTG, protein was
detected via Coomassie blue staining and the fusion protein was
subsequently purified with glutathione Sepharose 4B beads (GE
Healthcare) and detected via western blot. The same protocol was
followed when inducing the F148Y, H194Q and G310S mutated
TBX1 proteins.

In vitro Selection (SELEX) 

A 76-mer single-stranded library of oligonucleotides,
5'-GTAACGTCGAGACGGAATTCGCGGCCGCN18CTCGAGGATCCGTGCTCAGTCCCTATCG-3',
in which a random 18-mer sequence is flanked by two fixed
28-mer sequences used for sequencing, was synthesized by
Fisher Scientific (HPLC
purification) as previously described [41]. The second strands
were generated using Klenow enzyme (NEB) at 25°C for 3 hrs
with the primer 5'-CGATAGGGACTGAGCACGGATCCCT-3'.
The dsDNA samples were separated on a 4.5% UltraPure
Agarose 1000 gel (Invitrogen) and purified by Qiaquick Gel
Extraction Kit (Qiagen). PCR was performed to amplify the
dsDNA products. After six rounds of selection with recombinant
GST-TBX1 protein, the PCR products from round 0 (original
dsDNA randomers), two, four and six were labeled with [α-32P]
dCTP (PerkinElmer, Cat# NEG513H250UC) by Taq DNA
polymerase. Oligonucleotides from each round were captured
using glutathione Sepharose beads (GE Healthcare). Each labeled
round of oligonucleotides was tested via EMSA to determine the
round with the highest enrichment. Amino acids 90-303 were
digested from a plasmid containing full length Tbx1 cDNA
(Tbx1-pCDNA3.1) and subcloned into the pGEX4t3 vector for bacterial
induction (Figure S1a). Protein induction by IPTG was detected
via Coomassie blue staining as well as by western blot analysis.
The SELEX procedure was carried out for six rounds, enriching
the pool for oligonucleotides with the highest binding affinity
with each subsequent round. The PCR products from round six
were cloned into pSC-A vector for sequencing. In total, 60
colonies were picked, plasmid DNA was extracted and subjected
to Sanger sequencing (Einstein Genomics Core Lab). The
sequences were aligned using the WebLogo program
(http://weblogo.berkeley.edu) and two motifs were generated.
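
The step from sequenced clones to a consensus can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used WebLogo); it only assumes that each Sanger read contains the two fixed flanks quoted above in the same orientation, and it tallies bases per position of the 18-mer insert. The function names and the test reads are hypothetical.

```python
from collections import Counter

# Fixed flanks copied from the library design described in the text.
LEFT_FLANK = "GTAACGTCGAGACGGAATTCGCGGCCGC"
RIGHT_FLANK = "CTCGAGGATCCGTGCTCAGTCCCTATCG"
INSERT_LEN = 18  # length of the randomized region (N18)


def extract_insert(read):
    """Return the randomized region between the flanks, or None if absent.

    Assumption: the read is in the same orientation as the library oligo;
    reverse-complemented reads are not handled in this sketch.
    """
    start = read.find(LEFT_FLANK)
    if start == -1:
        return None
    start += len(LEFT_FLANK)
    end = read.find(RIGHT_FLANK, start)
    if end == -1 or end - start != INSERT_LEN:
        return None
    return read[start:end]


def position_counts(inserts):
    """Count bases at each position across equal-length inserts."""
    return [Counter(column) for column in zip(*inserts)]


def consensus(inserts):
    """Most frequent base at each position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(inserts))
```

A per-position count table of this kind is the input that sequence-logo tools visualize; separating the reads into two motif classes, as done in the study, would need an additional clustering step that is not shown here.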

Western Blot 

Proteins that were induced by IPTG and visualized with
Coomassie staining were also tested via western blot with specific
antibodies. Proteins were denatured in 6x Laemmli loading
buffer at 95°C for 5 minutes. Samples were then loaded onto a
10% acrylamide/bisacrylamide gel, run for 1 hour at 120 volts and
subsequently transferred onto a PVDF membrane (BioRad).
Antibodies used were: 1°- rabbit polyclonal α-mouse Tbx1 1:500
(Zymed); rabbit polyclonal α-GST 1:500 (Abcam); 2°- ECL
donkey anti-rabbit IgG, horseradish peroxidase linked whole
antibody 1:10,000 (Amersham Biosciences).

Electrophoretic Mobility Shift Assays (EMSA)

Oligonucleotides were end-labeled with [γ-32P] dATP and T4
Polynucleotide Kinase (NEB), and purified with G-50 Sephadex
columns (Roche). EMSA reactions were carried out in 12.5 µl total
volumes. GST-TBX1 (5-25 µg) was pre-incubated with 1 µg/µl
poly dI-dC, 100x unlabeled self-competitor in A100 buffer (0.5 M
HEPES, 50% glycerol, 0.5 M EDTA, 1 M MgCl2, 1 M KCl,
0.1 M DTT, H2O, proteinase K) for ten minutes at room
temperature as previously described [42]. Double stranded
[γ-32P] dATP labeled oligonucleotide was added together with
20% Ficoll and 10X EMSA buffer (0.5 M HEPES, 1 M MgCl2,
1 M KCl, 0.1 M DTT) and incubated for an additional ten
minutes at room temperature. Reactions were then loaded onto a
5% acrylamide gel and the protein-DNA complexes were
electrophoresed at 200 volts for 80 minutes.

Reporter Constructs 

Two oligonucleotides containing six copies of the tandem
repeat, AGGTGTGAAGGTGTGA (6x TR), and of the half site partial
site, AGGTGTGATCGCGTCAT (6x ½SPS), in tandem,
respectively, were synthesized by Genscript. The 6x
tandem oligonucleotides were digested with XhoI and NheI and
subcloned into a pGL3-promoter vector (Promega) [43]. Each
tandem repeat binding site had a spacer of two random
nucleotides (6xTR-pGL3p and 6x½SPS-pGL3p). After testing a
concentration gradient of full length Tbx1-pCDNA3.1, we
determined that 100 ng of this plasmid led to the highest fold
change (fold change of 27) when compared to the control experiment in which
the empty activating vector (Empty pCDNA3.1) was co-transfected
with the reporter. Mutated binding sites were
synthesized by Genscript. The AGGTGTGA sequence was
mutated to AATTTTGA (Mutated 6xTR-pGL3p and Mutated
6x½SPS-pGL3p) [31]. In addition, mutations at positions P8
A→T, P11 G→C and P13 G→T in the ½SPS were also
generated (P8 6x½SPS-pGL3p; P11 6x½SPS-pGL3p; P13
6x½SPS-pGL3p).

Luciferase Assays 

Jeg3 cells (ATCC) were co-transfected with 25 ng of one
reporter construct (6xTR-pGL3p; 6x½SPS-pGL3p; Mut. 6xTR-pGL3p;
Mut. 6x½SPS-pGL3p; P8 6x½SPS-pGL3p; P11
6x½SPS-pGL3p; and P13 6x½SPS-pGL3p), 25-200 ng of
activating construct (Tbx1-pCDNA3.1, F148Y Tbx1-pCDNA3.1,
H194Q Tbx1-pCDNA3.1, G310S Tbx1-pCDNA3.1) and 5 ng of
the internal control pRL TK vector (Promega). Cells were grown
in 10% FBS in DMEM (Invitrogen), trypsinized and plated onto
48 well plates. The following day, cells were transfected with the
above constructs together with Lipofectamine LTX (Invitrogen)
in MEM (Invitrogen). Four hours later, the media was changed to
10% FBS in DMEM. Luciferase assay readings were carried out
48 hours later using the Dual Luciferase Reporter Assay System
(Promega). All data are presented as means ±SD; n≥3. P-values
were determined using the Student's t-test.
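
How a fold change such as "27 fold over empty vector" can be derived from dual-luciferase readings is sketched below. The text does not spell out the exact normalization, so this assumes the common approach: each well's reporter (firefly) signal is divided by its internal control (Renilla, from pRL TK) signal, and the result is expressed relative to the mean of the empty-vector wells. Function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Reporter signal normalized to the internal control in the same well."""
    return firefly / renilla


def fold_change(test_wells, control_wells):
    """Return (mean, SD) of fold activation over the control wells.

    Each well is a (firefly, renilla) tuple. At least two test wells are
    needed for the sample standard deviation; the text uses n >= 3.
    """
    control_mean = mean(relative_activity(f, r) for f, r in control_wells)
    folds = [relative_activity(f, r) / control_mean for f, r in test_wells]
    return mean(folds), stdev(folds)


# Hypothetical readings: three Tbx1-transfected wells and three empty-vector wells.
tbx1_wells = [(200.0, 10.0), (300.0, 10.0), (400.0, 10.0)]
empty_wells = [(10.0, 10.0), (10.0, 10.0), (10.0, 10.0)]
mean_fold, sd_fold = fold_change(tbx1_wells, empty_wells)
```

Normalizing within each well before comparing conditions reduces the influence of well-to-well differences in transfection efficiency and cell number.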

Immunofluorescence 

Jeg3 cells were grown in 10% FBS in DMEM, trypsinized and
plated onto 6-well plates. Cells were transfected 24 hours later
with 500 ng of Tbx1-pCDNA3.1, F148Y Tbx1-pCDNA3.1,
H194Q Tbx1-pCDNA3.1, or G310S Tbx1-pCDNA3.1, and
Lipofectamine LTX (Invitrogen). Cells were fixed with 4%
paraformaldehyde and 4% sucrose 48 hours later, for 15 minutes
at room temperature. Cells were permeabilized with 0.3% Triton
X-100 and blocked with 10% BSA/PBS for 30 minutes at 37°C.
Cells were then incubated with rabbit polyclonal α-mouse TBX1
1:500 (Zymed) for two hours at 37°C and then with Alexa Fluor
488 goat α-mouse IgG (Invitrogen) and DAPI stain (1:500) for one
hour at 37°C.

Site Directed Mutagenesis 

The Quick Change Lightning Site-Directed Mutagenesis Kit
(Agilent Technologies) was used to generate mutations in TBX1.
Full length Tbx1-pCDNA3.1 was used as a template for the PCR
reaction. Primers were designed with the nucleotide changes:
F148Y Sense- 5'-CCCCACGTTCCAAGTGAAGCTTATGGAATGGATCC-3';
H194Q Sense- 5'-CTGGCCGAGTACAGTACCACCCGGACT-3';
G310S Sense- 5'-AACCACCGGCCCAGTGCGCTGCCGCTC-3'.
After DpnI digestion, plasmids
were transformed into XL10-Gold Ultracompetent cells and
plated overnight on LB-ampicillin plates, per instructions supplied
with the kit. Colonies were picked and grown in liquid culture
overnight. Plasmid DNA was isolated using the Qiagen mini-prep
kit and subjected to Sanger sequencing.



Bioinformatic Analysis 

An in-house bioinformatic program was created to search the mouse
genome for the consensus T-box motif, AGGTG(T/C)(G/T)A,
identified by the SELEX experiment. These sites were then compared
to a list of the most conserved elements produced by the phastCons
database based on whole-genome alignment of placental mammals
[44] from the UCSC genome browser (http://genome.ucsc.edu).
Motifs were then assigned to the nearest RefSeq genes and were then
grouped based on conservation and distance to transcriptional start
sites (TSS), defined as -100 kb to +1 kb and -1 kb to +100 kb. Gene
ontology software tools, GREAT (Genomic Regions Enrichment of
Annotations Tool- http://bejerano.stanford.edu/great/public/html/
index.php) and DAVID (http://david.abcc.ncifcrf.gov), were used to
generate functional groups of genes harboring the motifs, by inputting
the chromosomal positions of the putative TBX1 motifs.
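
The in-house search program is not described further, so the following Python sketch only illustrates one way the degenerate half-site AGGTG(T/C)(G/T)A could be located on both strands of a sequence. The regular expression encodes the motif exactly as written above; the function names and coordinate convention (0-based start on the forward strand) are assumptions made for this example.

```python
import re

# AGGTG(T/C)(G/T)A as a regular expression; the lookahead allows
# overlapping matches to be reported.
HALF_SITE = re.compile(r"(?=(AGGTG[TC][GT]A))")
MOTIF_LEN = 8
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_half_sites(seq):
    """Return sorted (start, strand, motif) tuples for every half-site.

    `start` is the 0-based position of the leftmost base of the site on
    the forward strand, for hits on either strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in HALF_SITE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in HALF_SITE.finditer(rc):
        # Convert the position on the reverse strand to forward coordinates.
        start = len(seq) - (m.start() + MOTIF_LEN)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

For example, `find_half_sites("TTAGGTGTGAAGGTGTGATT")` reports two forward-strand hits eight bases apart, which is the arrangement of the tandem repeat (TR) site.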

Whole-Mount RNA In Situ Hybridization 

Embryos were fixed in 4% paraformaldehyde overnight at 4°C.
The embryos were then serially dehydrated to 100% methanol
and stored at -20°C. On day 1 of the protocol, the embryos were
rehydrated to 1x PBS/0.01% Tween-20 and the in situ hybridization
assay was carried out as previously described (Franco et al., 2001).
Anti-sense digoxigenin labeled RNA probes to Tbx1 and Fgf8 [45]
were generated from plasmids via standard methods. The Bmper
probe was generated from templates amplified from E10.5 mouse
cDNA using the following primers: Fwd (5'-AGTCCTTGACTTGGCTTATC-3');
Rev (5'-GCACTTGGACATTATACTTGC-3').
Each RNA template was created by PCR with
a T3 RNA polymerase binding site at the 5' end and a T7 RNA
polymerase binding site at the 3' end. Embryos were dissected at
E9.5 and E10.5. Mice were maintained in a 12 hour dark/
12 hour light cycle in compliance with the Albert Einstein College
of Medicine of Yeshiva University Institutional Animal Care and
Use Committee (IACUC).

Results 

SELEX Identifies the Optimal Binding Sites of GST-TBX1 

To determine the optimal binding site of mouse TBX1, an in
vitro selection method termed Systematic Evolution of Ligands by
Exponential Enrichment (SELEX) was performed. We created a
GST-TBX1 fusion protein containing the T-box DNA binding
domain [12] and ten amino acids on either side (90-303) (Fig. S1).
The GST-TBX1 protein was able to bind to the Brachyury
palindromic sequence as determined by electrophoretic mobility
shift assay (EMSA) (Fig. S1). The validated GST-TBX1 protein
was then subjected to the in vitro SELEX selection method and
after six rounds of selection, we identified a clearly distinguishable
protein-DNA complex (Fig. 1A). Both protein-DNA complexes
were specific because binding of the radiolabeled oligonucleotides
was competed with non-radiolabeled oligonucleotides obtained
from round six of selection (Fig. 1B). We found two different sized
protein-DNA complexes in gel assays, suggesting that TBX1 binds
in two conformations (Fig. 1B). A total of 60 separate bacterial
clones containing enriched oligonucleotides were sequenced to
obtain a consensus sequence. Among them, 55 sequences were
selected and aligned to generate two 16 bp DNA consensus
sequences, containing one or two repeated GTGT "core" motifs
in a tandem orientation (Figs. 1C,D). The GTGT core motif is
part of the consensus binding site for T-box protein family
members, indicating overlap between the TBX1 motif and that of
other members. One consensus closely resembles the Brachyury
half-site of AGGTGTGA. We termed this motif the TBX1-TR
(Fig. 1C). The second consensus sequence is also a tandem repeat;
however, the 3' site is composed of a degenerate sequence with
only two highly conserved positions (13 and 16, Fig. 1C). We
termed the second consensus sequence TBX1-½SPS (half site
partial site) (Fig. 1C). In total, 62% of 55 sequences closely
resembled the TR site and 34% had the ½SPS site consensus
(Fig. 1D).

Two Identified Motifs are Specifically Bound by GST-TBX1 

To test for binding specificity, oligonucleotides were designed
that contained a single copy of the TR and ½SPS, where each
nucleotide corresponds to the most highly selected base at each
position: TR: 5'-AGGTGTGAAGGTGTGA-3' and ½SPS:
5'-AGGTGTGATCGCGTCAT-3'. The GST-TBX1 protein was
able to bind to both motifs by EMSA (Fig. 2A). Two
concentrations of protein were tested, showing the same proportion
of protein-DNA complexes (Fig. 2A). Binding was competed with
100X excess non-radiolabeled oligonucleotide of the same
sequence. After exposing the film for an extended period of time,
a second more slowly migrating protein-DNA complex appeared
that was similar to that present in Fig. 2A (data not shown). The
shifted protein-DNA complex using the ½SPS appeared to be
weaker in intensity on the gel as compared to the TR (Fig. 2A). In
the same experiment, the 8 bp half-site (T-site), 5'-AGGTGTGA-3',
and the Brachyury palindrome, 5'-TCACACCTAGGTGTGAA-3',
were also tested. A protein-DNA complex was never observed
when GST-TBX1 was incubated with the half-site and only after
overexposing the film for 24 hours was a protein-DNA complex
observed with the full Brachyury palindrome (data not shown). To
further test the specificity of the newly derived motifs, gradients of
both poly(dI-dC) and specific cold-competitor were generated and
tested by EMSA (Fig. S2A). As the concentration of poly(dI-dC)
increased, the binding intensity decreased; however, the protein-DNA
complex was located at the same position in the gel,
indicating specific binding occurred (Fig. S2A).

TBX1 Transcriptionally Activates Reporter Genes 

To test whether TBX1 could activate transcription of a reporter
by binding to the newly identified consensus sequences, we 
Figure 1. SELEX-Selection of specific oligonucleotides bound to GST-TBX1. A: A pipeline illustrating the SELEX method is shown. The
dsDNA was generated by PCR of the selected oligonucleotides at each round and incubated with GST-TBX1. A total of 6 rounds of selection was
performed. B: EMSA was used to detect specific GST-TBX1 and [α-32P]dCTP labeled DNA complexes at 0, 2, 4 and 6 rounds of selection, with or
without cold competitor (R6, cold PCR products from round 6; T, ds DNA harboring the published Brachyury half site). C: Sequence alignment shows
that the optimal DNA binding motif for TBX1 is AGGTGT(G/T)(A/T) followed by two repeated similar motifs termed the Tandem Repeat (TR) and Half
Site Partial Site (½SPS) as shown. D: Distribution of sequences with different consensus binding motifs within the pool of oligonucleotides after 6
rounds of selection (total number = 60).
doi:10.1371/journal.pone.0095151.g001
Figure 2. Specificity of new binding sites tested via EMSA and luciferase assays. A: The binding sites identified from the SELEX experiment
were tested with GST-TBX1 separately by EMSA to determine if there is specific binding (Tandem Repeat, TR; Half Site Partial Site, ½SPS; half site, ½
Site). For comparison, the palindromic T-site was also tested but binding was very weak as compared to the newly identified binding sites and was
only observed after extensive overexposure of the autoradiogram (not shown). B: Luciferase reporter constructs containing 6 copies (6x) of the TR
and 6x of the ½SPS, respectively, were co-transfected with full length Tbx1-pCDNA3.1 and compared to the empty pCDNA3.1 transfection to
determine if TBX1 could activate transcription of a reporter via these sites. A significant increase of luciferase activity was observed in the presence of
full-length TBX1 for both the 6x TR and the 6x ½SPS when compared to transfection of the empty pCDNA3.1 vector (TR: 29 fold; Student's t-test, ♦p<
0.001; ½SPS: 5.6 fold; Student's t-test, *p<0.02). The mutations analyzed were those previously tested in a half site where AGGTGTGA was mutated to
AATTTTGA [31]. When these nucleotide changes were present in the TR, there was a dramatic decrease in activation by Tbx1-pCDNA3.1 (7.4 fold;
Student's t-test, ♦p<0.001). The same mutation in the 6x ½SPS construct did not show a significant change when compared to the normal ½SPS
(n.s. not significant). All data are presented as means ±SD; n≥3.
doi:10.1371/journal.pone.0095151.g002



performed luciferase assays using Jeg3 cells. Jeg3 cells have
successfully been used previously to test TBX1 activation of
reporter constructs harboring endogenous gene loci, suggesting
that they have the necessary co-factors for TBX1 to bind and regulate
transcription [21]. Reporter constructs containing six tandem
copies of the TR or ½SPS were generated and tested in luciferase
assays (6xTR-pGL3p and 6x½SPS-pGL3p). After testing the full
length TBX1 protein in pCDNA3.1 at varying concentrations, we
found that 100 ng of the expression vector yielded the highest fold
change (27 fold) when compared to empty pCDNA3.1 vector (data
not shown). Based upon this, we used 100 ng of the
Tbx1-pCDNA3.1 construct for all subsequent luciferase assays. The
consensus sites and mutated versions of these sites (AGGTGTGA
to AATTTTGA) [31] were simultaneously evaluated in the same
experiment. As a control, for each binding site reporter construct,
the Tbx1-pCDNA3.1 transfection was compared to the simultaneous
transfection of empty pCDNA3.1 vector. The mutated
6xTR-pGL3p showed a dramatic decrease in transcriptional
activation when compared to the wild-type (WT) reporter
construct (Fig. 2B). The 6x½SPS-pGL3p also showed activation
in the presence of Tbx1-pCDNA3.1 and this activation was only
partially disrupted when the binding site was mutated (Fig. 2B).

Surrounding Nucleotides Outside the Half Site are Crucial 
For Binding 

To demonstrate the importance of the 3' half of the TBX1-½SPS
site and to further define essential nucleotides for binding, we
generated various mutations of position 8 (P8), 11 (P11) and 13
(P13) as these nucleotides seemed to vary the most when
comparing complex intensities on gel shift assays (data not shown)
(Fig. 3A). When the nucleotide at P8 was mutated from A→T,
binding of the lower, main protein-DNA complex (complex 1;
Fig. 1A) was abolished. In contrast, the upper less prominent
protein-DNA complex in the gel (complex 2; Figure 1A) remained
unchanged (Fig. 3A). Mutation at P11 from G→C
resulted in reduction of the upper protein-DNA complex (Fig. 3A).
When the P13 nucleotide was mutated from G→T, binding of
GST-TBX1 was lost (Fig. 3A). Effects of these three mutations in
the 2nd half-site demonstrate the importance of these nucleotides
in binding. We then proceeded to test these mutations in luciferase
reporter assays in cell culture. We generated luciferase reporter
constructs containing six copies of the mutated ½SPS at P8, P11
and P13. For all three, we observed reduced activation when
compared to the WT ½SPS consensus sequence. We concluded
that the surrounding nucleotides are necessary for the activation of
the reporter in tissue culture (Fig. 3B; statistical values in Figure
legend).

Mutations in TBX1 Lead to a Decrease in Activation

Human mutations in TBX1 have been previously identified in a
subset of patients with VCFS/DGS but with no deletion. These
mutations, including F148Y, H194Q and G310S, were previously
tested in transcription reporter assays in cell culture using the 
Brachyury palindrome sequence to determine whether they 
Figure 3. Surrounding nucleotides outside of the ½ Site are crucial for binding. A: EMSA was performed on mutated sequences that were
generated in the second ½ site to test if variation at position P8 (A→T), P11 (G→C) and P13 (G→T) affects binding. When P8 was changed, the faster
migrating binding conformation was lost. When P11 was changed, most of the slower migrating binding conformation was lost. Almost all binding
was lost when P13 was changed. B: Luciferase reporter assays were performed to test their effect on transcription. Constructs harboring 6
copies of the mutated binding site (mutated at either P8, 11 or 13, respectively) were co-transfected with the full length Tbx1 gene. Mutation of these
nucleotides affected luciferase activation. Data are presented as means ± SD; n≥3. Student's t-test, *p<0.02.
doi:10.1371/journal.pone.0095151.g003



altered transcription [19-21]. An increase in transcriptional activation
with the F148Y, H194Q and G310S mutations versus WT TBX1
in Jeg3 cells was previously observed [21]. To further test this
using the new consensus sequences, we generated the same point
mutations in GST-TBX1 and evaluated their DNA-binding and
transcriptional activation in cell culture (Fig. 4A, 4B). Protein-DNA
complexes were formed at the same position as for the WT
protein (shift 1; Fig. 1A). Binding to DNA was similar with both
consensus sequence probes as determined by EMSA (Fig. 4C and
4D). As for the wild-type protein, mutant TBX1 proteins could not
bind to the half site (data not shown). In addition, luciferase
reporter assays were carried out to determine if these mutations in
TBX1 could lead to a change in the activation of the reporter
constructs harboring the 6xTR or 6x½SPS. Cells were initially
transfected with the test construct (WT or mutated TBX1) and the
reporter construct. Mutated protein activation values were
compared to the WT protein values. We observed a statistically
significant decrease in activation in the presence of two mutations,
F148Y and G310S; the effect was more dramatic with the F148Y mutation,
which showed no activation when compared to WT values
(statistical values are presented in the Figure legend). Interestingly,
we did not observe any change in activation when we tested the
H194Q mutation. Because these are heterozygous mutations in
human patients, a reporter assay was carried out to test whether
adding in one wild type copy of Tbx1 would suppress the effect of
the mutated allele. Jeg3 cells were co-transfected with 50 ng of
Tbx1-pCDNA3.1 and 50 ng of either F148Y, H194Q or G310S
Tbx1-pCDNA3.1 as we described earlier. Although there was a
slight increase in activation in the presence of the WT protein with
the G310S mutant protein, these values were still lower for the
F148Y mutation when compared to the WT protein. Again, there
was no change observed when the WT TBX1 protein was
co-transfected with the H194Q mutant protein (Fig. 4E). Because
these recombinant proteins are not endogenously expressed in
Jeg3 cells, we examined whether the ectopic TBX1 proteins were
localized to the nucleus and not in the cytoplasm. Immunofluorescence
was performed to visualize this set of four proteins and
their nuclear localization was confirmed, using DAPI as a nuclear
stain (Fig. 4F). We concluded that transfection conditions that
mimic TBX1 haploinsufficiency due to the F148Y mutation
resulted in reduced activation of both reporter constructs.

In Silico Genome-Wide Screen for T-sites in the Mouse 
Genome 

A series of bioinformatic approaches was undertaken to identify
potential direct downstream transcriptional target genes by
examining annotated mouse genome sequence data (UCSC
genome browser, mm9). The first screen was done to detect
binding sites in blocks of evolutionarily conserved sequences. A total
of 235,414 sequences matching half sites were found. Among them,
12,659 (5.4%) half sites were found to overlap with conserved
elements (Fig. 5A). These were then assigned to the nearest RefSeq
genes within ±100 kb distance of transcriptional start sites (TSS).
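
The assignment of a motif to its nearest gene within ±100 kb of a TSS can be sketched as follows. This is an illustrative Python outline rather than the program used in the study: it assumes a pre-sorted list of (TSS position, gene name) pairs for a single chromosome, ignores gene strand, and uses hypothetical gene names.

```python
from bisect import bisect_left


def nearest_gene(position, tss_list, max_distance=100_000):
    """Return (gene, signed distance to TSS) or None if none is in range.

    `tss_list` must be sorted by position and hold (tss_position, gene_name)
    tuples for one chromosome. The signed distance is position minus TSS.
    """
    if not tss_list:
        return None
    i = bisect_left(tss_list, (position,))
    # Only the TSS just before and just after the motif can be the nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    tss, gene = min(candidates, key=lambda entry: abs(entry[0] - position))
    distance = position - tss
    if abs(distance) > max_distance:
        return None
    return gene, distance


# Hypothetical annotation for one chromosome.
tss_annotation = [(1_000, "GeneA"), (500_000, "GeneB")]
assignment = nearest_gene(50_000, tss_annotation)
```

Sorting the annotation once and using binary search keeps each lookup fast, which matters when hundreds of thousands of half sites are screened.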
mutagenesis, the wild type (WT) and mutant proteins were induced, purified and used for EMSAs. The protein-DNA complexes are shown before and
after IPTG induction. C, D: EMSAs of the WT and three mutated proteins using the radiolabeled TR (C) and ½SPS motifs (D). E: Luciferase assays were
performed after co-transfecting the reporter construct (6xTR or 6x½SPS) with WT or each mutated full length Tbx1-pCDNA3.1 construct. The F148Y
mutation led to a decrease in activation (TR- 17 fold decrease, ♦p<0.0005; ½SPS- 4.5 fold decrease, ♦p<0.0003) when compared to the WT
transfection. The H194Q mutation did not lead to a statistically significant change in activation but the trend was in the direction of decreased activity
(not significant- n.s). The G310S mutation led to a smaller but still significant decrease in activation (TR- 2.2 fold decrease, ♦p<0.01; ½SPS- 1.3 fold
decrease, ♦p<0.05). Equal amounts of WT and mutated Tbx1 were co-transfected with the respective reporter constructs to determine if there was
suppression of the mutant phenotype. There was a slight increase in activation when compared to the mutated F148Y alone transfection (TR- 4 fold,
•p<0.006; ½SPS- 1.9 fold, •p<0.006). Under these new conditions, the F148Y mutated TBX1 with WT protein still showed reduced activation when
compared to WT TBX1 (TR- *p<0.0003; ½SPS- *p<0.006). The H194Q+WT combination did not show any significant change (TR- p<0.4; ½SPS- p<
0.1). The G310S+WT combination showed a significant increase in activation (TR- 2 fold, •p<0.001; ½SPS- 1.5 fold, •p<0.02) when compared to
G310S mutant alone. All data are presented as means ±SD; n≥3. p-values were determined using the Student's t-test. F: Immunofluorescence
experiments were performed with antibodies to TBX1 on transfected Jeg3 cells to validate that the mutated constructs were localized to the nucleus
(green). Nuclear localization was confirmed by observing expression in DAPI stained nuclei shown in blue.
doi:10.1371/journal.pone.0095151.g004



The first screen identified a total of 187 motifs matching the half site 
consensus sequence AGGTG(T/C)(G/T)A within highly conserved 
elements (Logarithm of the Odds score, LOD > 500) (motifs 
and corresponding gene names can be found in Table S1). We 
found 425 motifs within moderately conserved elements (LOD 
score of 200-500) (Table S1). These could be putative binding sites 
for any T-box gene. We then searched for half sites that contained 
the second partial site to identify ½SPSs, which would be more 
selective for binding of TBX1. We also examined whether any of the 
half sites had a second direct tandem repeat. None of the sites within 
evolutionarily conserved blocks from the search above had a second 
direct repeat (TR). Therefore, a second bioinformatic screen was 
done to search for TR sites anywhere in the genome, irrespective of 
evolutionary conservation (Fig. 5B). A total of 302 TR sites were 
found throughout the genome (Table S2). We searched for gene 
ontology groups for all of the genes harboring putative T-half sites 
and TR sites. Most of the groups that were identified were those 
involved in embryonic developmental processes and mRNA 
transcription regulation (Fig. 5C,D) [46,47]. We then examined 
each gene for its known function or expression pattern, using the 
literature and the MGI JAX database (www.informatics.jax.org), to 
ascertain whether any could be a putative TBX1 downstream 
transcriptional target (data not shown). 
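
The motif search described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes the half site is matched as the degenerate pattern AGGTG(T/C)(G/T)A, that the TR is matched as the exact 16-mer AGGTGTGAAGGTGTGA, and that both strands are scanned. The function names (`reverse_complement`, `find_sites`) and the toy sequence in the usage note are hypothetical.

```python
import re

# Half-site consensus as written in the text: AGGTG(T/C)(G/T)A
HALF_SITE = "AGGTG[TC][GT]A"
# Assumption: the tandem repeat (TR) is searched as the exact 16-mer
TANDEM_REPEAT = "AGGTGTGAAGGTGTGA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, pattern):
    """Find all (possibly overlapping) matches on both strands.

    Returns a sorted list of (start, strand) tuples, where start is the
    0-based position of the leftmost base on the forward strand.
    """
    seq = seq.upper()
    # A lookahead lets overlapping matches be reported.
    lookahead = re.compile("(?=(" + pattern + "))")
    hits = [(m.start(), "+") for m in lookahead.finditer(seq)]
    rc = reverse_complement(seq)
    for m in lookahead.finditer(rc):
        width = len(m.group(1))
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((len(seq) - m.start() - width, "-"))
    return sorted(hits)
```

For a toy sequence such as `"CCAGGTGTGAAGGTGTGACC"`, `find_sites(seq, HALF_SITE)` reports two forward-strand half sites and `find_sites(seq, TANDEM_REPEAT)` reports one TR. Restricting hits to conserved elements would be a separate filtering step on the returned coordinates.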

Candidate TBX1 binding sites near genes with known 
expression patterns in mouse embryos similar to that of Tbx1 or 
with phenotypes similar to that in Tbx1-/- mutant embryos were of 
particular interest to pursue. To further narrow the list of possible 
genes regulated by Tbx1, we focused on those containing ½SPS or 
TR sites. We first checked whether the expression of any was 
altered in previous gene expression arrays in experiments where 
Tbx1+/+ versus Tbx1-/- embryonic tissues were compared [26,28]. 
Twenty-seven genes were initially selected to determine if 
GST-TBX1 could bind. 

Electrophoretic mobility shift assays were performed on motifs 
near candidate downstream target gene loci to determine if 
GST-TBX1 could bind to them. GST-TBX1 formed protein-DNA 
complexes with 19 of the 27 motifs, with three distinct intensities of 
protein-DNA complexes referred to as high (similar to TBX1-TR), 
medium and low (Table S3). The positions of the complexes in the 
gel were all the same, suggesting similar binding conformations. 
Reporter constructs were generated to include the motif to be 
tested with approximately 200 bp flanking either side. Of the 19 
motifs tested for binding, we chose three of the strongest binding 
candidates for additional studies: Fgf8, Bmper and the Otog-MyoD 
locus. 

The Fgf8 gene encodes a secreted fibroblast growth factor (FGF) 
that is required for craniofacial and heart development [45]. 
Relevant to Tbx1, a genetic interaction between Fgf8 and Tbx1 has 
been found [29,45]. The Fgf8 locus has a ½SPS located 4 kb 
downstream of the transcriptional stop site (Fig. 6A). This site falls 
in a highly evolutionarily conserved sequence block, across 
mammals and vertebrates. Interestingly, this is a known Fgf8 
regulatory region for somite and tail bud mRNA expression, 
conserved from zebrafish to mouse [48,49]. GST-TBX1 was able 
to bind to the ½SPS motif (Fig. 6B) and this was at a similar 
intensity as compared to the consensus ½SPS 
(AGGTGTGATCGCGTCAT) (data not shown). As expected from the 
EMSA, the transcription reporter assay in Jeg3 cells showed 
activation at a level similar to the ½SPS consensus (5-fold change) 
(Fig. 6C). Whole mount RNA in situ hybridization comparing 
Tbx1+/+ and Tbx1-/- embryos at E10.5 shows a decrease in 
expression of Fgf8 in the pharyngeal arch endoderm (Fig. 6D) as 
previously reported [29,50]. We did not detect a change in somite 
or tail bud expression, suggesting possible functional redundancy 
with other T-box genes with similar expression patterns. 

The Bmper gene encodes a secreted protein that inhibits bone 
morphogenetic protein (BMP) function. The Bmper locus has a TR 
binding site in the intron lying between exons 13 and 14, sharing 
evolutionary conservation only with rat and opossum (Fig. 7A). 
GST-TBX1 was able to bind strongly to the 40 bp element 
harboring the motif (Fig. 7B) and this was similar to that of the TR 
consensus (data not shown). There was a small 1.8-fold increase in 
transcription activation in the presence of TBX1 (Fig. 7C). We 
suggest that the small fold activation here, compared to the 
experiments using the 6x TR consensus sequence, could be due to 
our use of a 400 bp sequence element that might harbor 
inhibitory sites surrounding the single TBX1 binding site. Bmper 
expression is lost in part of the first pharyngeal arch in Tbx1-/- 
embryos at E10.5, and expression in the inner ear is altered as well 
(Fig. 7D), suggesting that it could be a direct downstream 
transcriptional target. 

The third site that was evaluated was a TR site in the Otog gene 
in intron 53 (of a total of 56 exons). The Otog gene encodes an 
N-glycosylated protein present in the acellular membranes of the 
sensory epithelia patches of the inner ear, important for hearing 
[51]. Otog and MyoD are neighboring genes; however, the TR motif 
within the regions tested in EMSA and luciferase assays is 70 kb 
from the MyoD TSS (Fig. 8A). The GST-TBX1 fusion protein can 
bind to the Otog-MyoD sequence block containing the TR motif, 
and binding was competed by unlabeled DNA of the same sequence. 
Although binding appeared to be strong by EMSA, transcription 
was only activated 2.5-fold (Fig. 8B and 8C). The MyoD gene, 
encoding a basic helix-loop-helix myogenic regulatory transcription 
factor, lies adjacent to Otog. We were not able to generate a 
specific probe for in situ hybridization analysis of Otog. As has been 
reported, RNA expression of MyoD is lost in the first pharyngeal 
arch core mesoderm in Tbx1-/- null mutant embryos [52] (Fig. 8D). 
We conclude that this endogenous TR site is a possible candidate 
for MyoD regulation by TBX1. 
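
The reporter results above are summarized as fold activation with a Student's t-test. The following is a minimal sketch of that arithmetic, assuming a pooled-variance two-sample t statistic computed on replicate luciferase readings. The function names and the readings in the usage note are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def fold_activation(control, treated):
    """Ratio of mean reporter activity with TBX1 to mean activity without."""
    return mean(treated) / mean(control)


def student_t(control, treated):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Converting t to a p-value requires
    the t distribution, which is outside the scope of this sketch.
    """
    n1, n2 = len(control), len(treated)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance from the sample variances.
    pooled = ((n1 - 1) * variance(control) + (n2 - 1) * variance(treated)) / df
    t = (mean(treated) - mean(control)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df
```

As a usage example with made-up triplicates, `fold_activation([1.0, 1.2, 0.8], [5.0, 5.5, 4.5])` gives a 5-fold activation, and `student_t` on the same lists returns the statistic with 4 degrees of freedom.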
Figure 5. Genome wide search of T-sites. A, B: Representative examples depicting the genome-wide search for ½SPS and TR motifs. The first 
search was done for the ½ site in evolutionarily conserved blocks within 100 kb upstream and downstream of the TSS of genes. The second search 
for the TR was expanded since there were fewer sites (303), irrespective of distance to the TSS and conservation across mammals or vertebrates. C, D: 
Bar graphs depict the number of genes comprising top gene ontology categories using the Database for Annotation, Visualization and Integrated 
Discovery (DAVID v6.7, http://david.abcc.ncifcrf.gov). 
doi:10.1371/journal.pone.0095151.g005 
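
The assignment of each site to the nearest gene within 100 kb of a TSS can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used here: TSS positions are taken as a sorted list for a single chromosome, strand is ignored, and the gene names and coordinates in the usage note are hypothetical.

```python
from bisect import bisect_left

WINDOW = 100_000  # 100 kb on either side of the TSS, as in the text


def nearest_gene(site_pos, tss_list, window=WINDOW):
    """Assign a site to the nearest TSS on the same chromosome.

    tss_list is a list of (tss_position, gene_name) sorted by position.
    Returns (gene_name, signed_distance) or None if no TSS lies within
    the window. The signed distance is site_pos minus the TSS position.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, site_pos)
    # Only the TSS just below and just above the site can be nearest.
    candidates = tss_list[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda t: abs(t[0] - site_pos), default=None)
    if best is None or abs(best[0] - site_pos) > window:
        return None
    return best[1], site_pos - best[0]
```

With a toy annotation `[(1000, "geneA"), (500000, "geneB")]`, a site at position 90000 is assigned to geneA, while a site at 250000 is left unassigned because both TSSs are more than 100 kb away.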



Discussion 

The T-box family of transcription factors is important in 
vertebrate development and human disease. The preferential 
binding sites of a number of T-box proteins, including Brachyury, 
TBX2, TBX5, TBX6, TBX15 and TBX18, were previously 
identified by taking either in vitro or in vivo approaches 
[4,12,14-16]. Most can bind as monomers to a Brachyury consensus half 
site, or as dimers to a palindrome, while few can also bind to a 
tandem repeat. In this study we carried out an in vitro selection 
method (SELEX) to identify the preferential binding site of 
mammalian TBX1. We found two classes of binding sites: a 
perfect direct repeat (TR), consisting of two classic Brachyury half 
sites, and a second, imperfect direct repeat (½SPS), in which the 5' 
site is similar to the Brachyury half site, but the 3' half is different. 
One important possibility is that differences in the amino 
acid composition of TBX1 confer a binding preference different 
from that of some of the other T-box proteins. For example, 
TBX1 appears to bind strongly to the TR, but weakly to the 
Brachyury palindrome and not at all to the half site motif. 
Although the DNA binding domain is highly conserved amongst 
different T-box proteins, some differences may contribute to 
specificity of binding and the sequences might affect the 
orientation in which various T-box proteins bind to DNA. 
Interestingly, a few amino acids that are important for Brachyury 
dimers to bind to the palindrome are not conserved in TBX1 
[20,53]. This may explain the difference in binding preference. 
Using the Brachyury crystal structure, amino acids important for 
both dimerization and DNA binding have been mapped [53]. 
Six amino acids important for binding and dimerization, 
distributed throughout the protein, are different between 
Brachyury and TBX1. Three of these amino acids are important for 
dimerization (M87D, N131A, F132K) and three are important for 
DNA binding (K103R, K151N, A216G; Brachyury to TBX1 amino acid 
change, respectively). Perhaps these differences at crucial positions 
lead to a secondary structure conformational change allowing 
TBX1 molecules to bind preferentially in a head-to-tail orientation. 
The TBX1 protein and DNA crystal structure has been published, 
but this was done using the palindromic Brachyury binding site 
[54]. This group found that two TBX1 proteins can bind as 
monomers to the palindromic sequence. In our gel shift assays, 
TBX1 and Brachyury formed similar sized protein-DNA 
complexes (data not shown), suggesting that TBX1 might bind as a 
dimer to the TBX1 TR, since it consists of two half sites. Now that 
the TBX1 TR has been identified as the preferential binding site, a 
new crystal structure might lead to further understanding of key 
residues of TBX1 required for binding to DNA. 

We found some inconsistencies between apparent binding 
affinities to DNA in EMSAs versus transcriptional activity in 
luciferase assays in cell culture. For example, although the P13 
mutation in the ½SPS led to the greatest loss of binding in vitro, it 
had the least effect on transcription (2.3-fold decrease). One 
possible reason is that we used only the DNA binding domain of 
TBX1 for EMSAs but used the full-length protein for luciferase 
assays. There are multiple examples where there is a lack of direct 
correlation between relative binding affinity and transcriptional 
activation of a cis-acting motif [55]. For example, the ETS-1 DNA 
binding domain (DBD) undergoes minimal secondary structural 
changes in the presence of DNA, but the full-length protein 
binding to DNA does induce changes in secondary structure at a 
distance from the protein-DNA interface [56]. Glucocorticoid 
receptor binding also affects structure and activity of the protein 
on DNA where stronger activating downstream sites bound 
Figure 6. The Fgf8 locus has a ½SPS that is bound and activated by TBX1. A: Snapshot from the UCSC Genome Browser showing the 
position of the ½SPS, located 4 kb downstream of the Fgf8 gene. The region is ultraconserved from humans to fugu. B: EMSA for the 40 bp element 
in the Fgf8 locus harboring the ½SPS motif with GST-TBX1. Lanes with unlabeled competitor are shown. C: The 400 bp element within the Fgf8 locus was 
subjected to luciferase assays in cell culture and was activated in the presence of Tbx1-pCDNA3.1 (5-fold; Student's t-test *p<0.002; data are 
presented as means ±SD; n>3). D: Whole mount in situ hybridization of Fgf8 antisense mRNA in Tbx1+/+ (left) versus Tbx1-/- (right) mouse embryos at 
E10.5. Expression of Tbx1 is reduced in the distal pharyngeal apparatus (arrow), but remains in the rest of the embryo (first pharyngeal arch, head, 
limb buds, somites). 
doi:10.1371/journal.pone.0095151.g006 



equally in gel shift assays as those more weakly activated in 
luciferase assays [57]. It was also noted that changing even one 
nucleotide in the binding sequence could affect the binding and 
transcriptional activation. DBDs are not only important for 
protein-DNA interactions, but for protein-protein interactions as 
well. Perhaps GST-TBX1, in the presence of the P13 binding site, 
has a more open interface allowing it to interact with other 
co-factors that provide for a more stable activation of transcription as 
opposed to the P8 or P11 nucleotide changes. 

Mutations in TBX1 

Previous studies have tested whether TBX1 mutations have an 
impact on transcriptional regulation of reporter constructs, but 
these used palindromic sequences as the binding motif and 
had conflicting results [20,21]. Since the binding consensus sites 
identified in the SELEX assay had a roughly 20-fold increase in 
binding and transcription activation by TBX1, we reasoned that they 
would provide a more sensitive indicator of any change in binding 
or transcription by missense changes in TBX1. Previous studies did 
not perform any in vitro binding assays to DNA. We found that the 
F148Y, H194Q and G310S mutant proteins could strongly bind 
to the two consensus sequences we identified (TR and ½SPS sites). 
In contrast to what has been previously reported, where the F148Y 
and G310S mutations showed no effect on transcription [19] or 
the three (F148Y, H194Q, G310S) showed an increase in 
transcription of reporters using the palindrome site [21], we 
found a decrease in reporter gene activation, in particular for the 
F148Y mutation. This suggests that activation or repression might 
be DNA binding motif-specific. One possible explanation of this 
difference with previous reports [19-21] is that the new consensus 
Figure 7. The Bmper locus has an intronic TR site that is bound and activated by TBX1. A: Snapshot of the UCSC browser showing the TR 
site in intron 13 in the Bmper locus; it is evolutionarily conserved in some rodent species. B: GST-TBX1 was able to bind to the TR motif in the 
Bmper locus. C: Luciferase assay results demonstrate a 1.8-fold increase (Student's t-test *p<0.003; ±SD; n≥3) in activation when Tbx1-pCDNA3.1 was 
cotransfected with the 400 bp element. D: Whole mount in situ hybridization of Bmper antisense mRNA in Tbx1+/+ (left) versus Tbx1-/- (right) embryos 
at E10.5. Expression of Tbx1 is reduced in the core mesoderm of the pharyngeal apparatus (long arrow) and otic vesicle (short arrow), but remains in 
the somites. 
doi:10.1371/journal.pone.0095151.g007 



sequence(s) provides a greater sensitivity in measuring changes in 
the mutant proteins. Based on the published TBX1 crystal 
structure, F148 and H194 are involved in neither DNA binding 
nor dimer formation, but it was noted that F148 lies at the surface of 
the protein [54]. The authors explain that this residue then may 
have an effect on protein-protein interactions with other co-factors 
necessary for transcription to occur [54]. This is consistent with our 
data in which the F148Y mutation does not affect binding to DNA 
(Figs. 4C, D), but does lead to a loss of activation in reporter assays 
(Fig. 4E). 



In addition to understanding the effects of mutations on gene 
function, one major goal is to identify direct transcriptional target 
genes required for embryonic development. Using various 
bioinformatic selection methods, we identified and validated 
DNA binding to 19 different motifs present in gene loci of 
interest, and confirmed transcriptional activation for 11 of them 
(data not shown), including Fgf8, Bmper and Otog-MyoD. 

Fgf8 

The pharyngeal apparatus is an embryonic structure that 
becomes remodeled to form the face, neck and cardiac outflow 
Figure 8. TBX1 can bind to a TR site in the Otog-MyoD locus. A: Snapshot from the UCSC genome browser showing the TR site in intron 13 of 
the Otog gene and 77 kb downstream from the TSS of MyoD. B: EMSA of GST-TBX1 and the TR motif in the Otog-MyoD locus with or without 
unlabeled TR double stranded oligonucleotide competitor. C: Luciferase reporter assays using the intronic element showed a 2.5-fold increase 
(Student's t-test, *p<0.05; means ±SD; n>3) in activation in the presence of TBX1. D: Expression of MyoD is lost in the 1st pharyngeal arch core 
mesoderm in Tbx1-/- embryos at E10.5 (arrow) but it remains in the somites. Forebrain expression in both embryos represents a staining artifact. 
doi:10.1371/journal.pone.0095151.g008 



tract [58]. Tbx1 is expressed in mouse embryos in the endoderm 
and mesoderm of the pharyngeal arches as well as the ectoderm of 
the distal pharyngeal apparatus, with some localized expression in 
the somites [26,45]. Tbx1 and Fgf8 are coexpressed in the 
pharyngeal endoderm and a subset of the pharyngeal mesoderm 
and they genetically interact in mouse embryos, implicating them 
in the same genetic pathway [29,50]. An evolutionarily 
conserved element has been identified downstream of the Fgf8 
gene locus and drives expression of a reporter in forebrain, somites 
and tail bud but not the pharyngeal apparatus [48]. Tbx1 and Fgf8 
are also expressed in the somites and presomitic mesoderm (PSM). 
However, inactivation of Tbx1 does not result in loss of Fgf8 
expression in these tissues, nor does it affect development of these 
structures. The most parsimonious explanation is that Tbx1 acts 
redundantly with other T-box genes, upstream of Fgf8. There are 
several T-box genes expressed in the somites and tail bud [59-61] 
and one of these may in fact regulate transcription, possibly 
Brachyury, which can also bind to direct repeats, or TBX6, whose 
preferential binding site has some resemblance to the ½SPS [15]. 

Bmper, Otog-MyoD and transcription regulation 

Although many known regulatory regions show evolutionary 
conservation, not all follow this pattern. We found putative TBX1 
protein binding sites in the Bmper and Otog-MyoD gene loci; 
however, they are not in regions of high evolutionary conservation. 
There have been a number of reports examining regulatory 
regions that are not in conserved elements. For example, many of 
the p300 sites found by ChIP-Seq (chromatin immunoprecipitation 
followed by next-generation sequencing) were not in 
evolutionarily conserved blocks; however, they did drive 
expression of reporters in vivo [62]. The same was true for PHOX2B 
direct downstream target genes found in zebrafish [63]. Recent 
investigation of specific binding by liver transcription factors in five 
vertebrate species has shown that occupancy of a small minority 
(10%-22%) of the binding sites on DNA is conserved among 
mammalian species [64]. Non-conserved, biologically functional 
enhancers have also been identified upstream of paxQ and otx1b in 
zebrafish [65]. Changes in transcriptional programs through 
changes within non-conserved regions are thought to drive 
evolution [66]. Therefore, we suggest that the TR sites warrant 
careful investigation. In this regard, we analyzed such sites in the 
Bmper and the Otog-MyoD loci. 

The core mesoderm of the pharyngeal arches forms the muscles 
of the craniofacial region and neck, required for chewing and 
swallowing [58]. We found that the Bmper gene, encoding a BMP 
antagonist [67-69], is strongly expressed in the central core 
mesoderm region of the pharyngeal arches. The first pharyngeal 
arch forms, but the distal arches do not form in Tbx1-/- embryos. 
We found that Bmper mRNA expression is lost in the first 
pharyngeal arch in Tbx1-/- embryos. 

Similar to Bmper, another gene, MyoD, is also expressed in the 
core mesoderm of the pharyngeal arches and it is also reduced in 
expression in Tbx1-/- embryos. Inactivation of MyoD and other 
basic helix-loop-helix regulatory transcription factors results in 
loss of craniofacial muscle formation [70,71]. Similarly, inactivation 
of Tbx1 results in loss of development of craniofacial muscles 
[52,72]. The motif we identified is within the Otog gene body, but 
near the 3' end of the gene. Otogelin (Otog) encodes a glycoprotein 
present in the acellular gelatinous structures covering the sensory 
epithelia of the inner ear [51,73]. It is known that Otog is expressed 
in the inner ear as early as E10. Mutations in Otog lead to 
autosomal-recessive sensorineural nonsyndromic hearing loss, 
showcasing the importance of this protein in inner ear development 
and hearing [74]. Unfortunately, we were not able to 
generate an RNA anti-sense probe to Otog to determine if it is 
co-localized with Tbx1. On the other hand, it is possible that this site is 
important for regulation of MyoD expression in vivo, that these 
sequences are important for regulation of Otog, or neither. Nearby 
enhancers within neighboring genes have been found to regulate 
genes at a distance. This is the case for Dlx5, with two enhancers 
being exons of the neighboring gene Dync1i1 [75]. Only future 
ChIP and in vivo reporter assays in transient transgenic mouse 
models can validate this hypothesis. 

In conclusion, we used an in 
vitro SELEX selection process to identify two novel TBX1 
consensus sequences, the TBX1-TR and the TBX1-½SPS. We 
found that TBX1 can activate reporter constructs harboring the 
newly identified binding sites in tissue culture. In addition, we have 
also demonstrated that in the presence of the F148Y human 
mutation in TBX1, activation of reporter constructs was strongly 
diminished. This was only possible with a highly active 
consensus site for transcription reporter assays in cell culture. 
Finally, as a prelude to future ChIP-seq and other biochemical 
studies, we provide an in silico list of possible direct downstream 
target genes, some of which may be biologically relevant to TBX1 
function, such as Fgf8, Bmper and Otog-MyoD. 
Supporting Information 

Figure S1 Cloning of GST-Tbx1 (T-box) Construct. A: 
The T-box region (amino acids 90-303) of mouse Tbx1 was PCR 
amplified from cDNA with flanking EcoRI and XhoI restriction 
enzyme sites. These sites were used to subclone the fragment into 
the pGEX4t3 vector (GE Healthcare) to generate a GST-TBX1 
fusion protein. B: GST-TBX1 was detected via western blot with 
an α-GST antibody, with an approximate molecular weight of 
52 kD. C: EMSA showing that recombinant GST-TBX1 (90-303) binds 
to the published palindromic Brachyury motif [12]. Protein 
dilution, 1:1; exposure time, 6 hrs. Probe: 
CTAGATTTCACACCTAGGTGTGAAATCTAG. 
(TIF) 

Figure S2 Testing the binding specificity of TBX1 
binding motifs. Gradients of both poly dI-dC (left) and specific 
cold-competitor (right). As the concentration of the poly dI-dC 
increased (0.5-2 μg), the binding decreased in intensity but the 
creation of protein-DNA complexes still occurred at the same 
position. Increasing amounts of specific cold-competitor 
(25x-200x) were used to demonstrate the specificity of binding. 
(TIF) 

Table S1 Sites and nearest genes with T half site. The 
sites listed below are half sites that lie in evolutionarily conserved 
regions across the mouse genome (mm9). 
(XLS) 

Table S2 Sites and nearest genes with tandem repeat 
site. The sites listed below are TR sites across the mouse genome 
(mm9), irrespective of evolutionary conservation. 
(XLSX) 

Table S3 TR and ½SPS endogenous sites in the mouse 
genome. The sites listed above are the endogenous sites that were 
bound by GST-TBX1 in EMSA experiments. Listed as well is the 
distance to the TSS and comparative gel shift band intensity. *Gel 
shift bands were compared to the binding of TBX1 to the TR 
TBX1 site (High). 
(TIF) 